Learning to classify documents according to genre

Authors

  • Aidan Finn
  • Nicholas Kushmerick
Abstract

Current document retrieval tools succeed in locating large numbers of documents relevant to a given query. While search results may be relevant according to the topic of the documents, it is more difficult to identify which of the relevant documents are most suitable for a particular user. Automatic genre analysis, that is, the ability to distinguish documents according to style, would be a useful tool for identifying documents that are most suitable for a particular user. We investigate the use of machine learning for automatic genre classification. We introduce the idea of domain transfer (genre classifiers should be reusable across multiple topics), which doesn't arise in standard text classification. We investigate different features for building genre classifiers and their ability to transfer across multiple topic domains. We also show how different feature-sets can be used in conjunction with each other to improve performance and reduce the number of documents that ...
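
The domain transfer idea in the abstract can be illustrated with a small sketch: a genre classifier is trained on documents about one topic and evaluated on documents about a different topic. Everything in the block is an assumption made for illustration and is not taken from the paper: the two genre labels ("opinion" and "fact"), the two style features, the nearest-centroid classifier, and the toy documents are all hypothetical.

```python
# Hypothetical sketch of a domain-transfer evaluation for genre classification.
# Features, labels, classifier, and data are illustrative assumptions only.

FIRST_PERSON = {"i", "we", "my", "our", "me"}


def features(text):
    """Map a document to topic-independent style features."""
    words = text.split()
    n = max(len(words), 1)
    first_person_rate = sum(w.lower() in FIRST_PERSON for w in words) / n
    exclamation_rate = text.count("!") / n
    return [first_person_rate, exclamation_rate]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(column) for column in zip(*vectors)]


def train(docs):
    """Build one centroid per genre label from (text, label) pairs."""
    by_label = {}
    for text, label in docs:
        by_label.setdefault(label, []).append(features(text))
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def predict(model, text):
    """Return the label whose centroid is closest in squared distance."""
    f = features(text)
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(f, model[label])),
    )


def accuracy(model, docs):
    """Fraction of (text, label) pairs the model labels correctly."""
    return sum(predict(model, text) == label for text, label in docs) / len(docs)


# Source topic domain (training) and a different target topic domain (testing).
football_docs = [
    ("I think we were robbed and I loved every minute!", "opinion"),
    ("The match ended 2-1 after extra time.", "fact"),
]
politics_docs = [
    ("I believe my party will win and I am thrilled!", "opinion"),
    ("The bill passed by 52 votes to 48.", "fact"),
]

model = train(football_docs)
transfer_accuracy = accuracy(model, politics_docs)
print("accuracy when transferring football -> politics:", transfer_accuracy)
```

The features deliberately ignore topic vocabulary, which is what lets a classifier of this kind be reused on a new topic; a bag-of-words model trained on football terms would have little to go on when shown political text.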


Similar resources

Multiple sets of features for automatic genre classification of web documents

With the increase of information on the Web, it is difficult to find desired information quickly out of the documents retrieved by a search engine. One way to solve this problem is to classify web documents according to various criteria. Most document classification has been focused on a subject or a topic of a document. A genre or a style is another view of a document different from a subject ...

User Assessment of a Visual Web Genre Classifier

Users assess the “appropriateness” of web documents in many ways. Traditionally, appropriateness has been solely a matter of relevance to a particular topic. But users are concerned with other aspects of document “genre”, such as the level of expertise assumed by the author, or the amount of detail. In previous work, we have used machine learning to automatically classify documents along a vari...

Adjectives and Adverbs as Indicators of Affective Language for Automatic Genre Detection

We report the results of a systematic study of the feasibility of automatically classifying documents by genre using adjectives and adverbs as indicators of affective language. In addition to the class of adjectives and adverbs, we focus on two specific subsets of adjectives and adverbs: (1) trait adjectives, used by psychologists to assess human personality traits, and (2) speaker-oriented adv...

Combining classifiers for flexible genre categorization of web pages

With the increase in the number of web pages, it is very difficult to find wanted information easily and quickly out of thousands of web pages retrieved by a search engine. To solve this problem, many researchers propose to classify documents according to their genre, which is another criterion for classifying documents, different from the topic. Most of these works assign a document to only one genre...

Internet Genres

Rhetoricians since Aristotle have attempted to classify communications or documents into categories or “genres” with similar form, topic or purpose. This article surveys research on genre as it relates to Internet documents. The article briefly presents the concept of genre in general, and then reviews the evolution and emergence of genres on the Internet. It concludes with an examination of th...

Journal:
  • JASIST

Volume 57

Publication date: 2006